DSPatch: Dual Spatial Pattern Prefetcher
High main memory latency continues to limit performance of modern
high-performance out-of-order cores. While DRAM latency has remained nearly the
same over many generations, DRAM bandwidth has grown significantly due to
higher frequencies, newer architectures (DDR4, LPDDR4, GDDR5) and 3D-stacked
memory packaging (HBM). Current state-of-the-art prefetchers are poor at
converting this additional DRAM bandwidth into higher performance.
Prefetchers need the ability to dynamically adapt to available bandwidth,
boosting prefetch count and prefetch coverage when headroom exists and
throttling down to achieve high accuracy when the bandwidth utilization is
close to peak. To this end, we present the Dual Spatial Pattern Prefetcher
(DSPatch) that can be used as a standalone prefetcher or as a lightweight
adjunct spatial prefetcher to the state-of-the-art delta-based Signature
Pattern Prefetcher (SPP). DSPatch builds on a novel and intuitive use of
modulated spatial bit-patterns. The key idea is to: (1) represent program
accesses on a physical page as a bit-pattern anchored to the first "trigger"
access, (2) learn two spatial access bit-patterns: one biased towards coverage
and another biased towards accuracy, and (3) select one bit-pattern at run-time
based on the DRAM bandwidth utilization to generate prefetches. Across a
diverse set of workloads, using only 3.6KB of storage, DSPatch improves
performance over an aggressive baseline with a PC-based stride prefetcher at
the L1 cache and the SPP prefetcher at the L2 cache by 6% (9% on
memory-intensive workloads, and up to 26%). Moreover, the performance of
DSPatch+SPP scales with increasing DRAM bandwidth, growing from 6% over SPP to
10% when DRAM bandwidth is doubled.
Comment: This work is to appear in MICRO 201
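
The three-step key idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design. Several details are assumptions made only for illustration: 64 cache lines per page, anchoring by rotating offsets modulo the page size, a coverage-biased pattern built with bitwise OR, an accuracy-biased pattern built with bitwise AND, and a single bandwidth-utilization threshold of 0.75. The names `anchor_pattern` and `DualPatternSketch` are hypothetical.

```python
LINES_PER_PAGE = 64  # assumption: 4 KB page with 64-byte cache lines


def anchor_pattern(offsets, width=LINES_PER_PAGE):
    """Encode accessed line offsets as a bit-pattern anchored to the trigger.

    The first offset is treated as the "trigger" access and mapped to bit 0;
    every other access is stored relative to it (wrapping around the page).
    """
    trigger = offsets[0]
    pattern = 0
    for off in offsets:
        pattern |= 1 << ((off - trigger) % width)
    return pattern


class DualPatternSketch:
    """Keeps two learned patterns: one biased to coverage, one to accuracy."""

    def __init__(self):
        self.cov = 0      # coverage-biased: union (OR) of observed patterns
        self.acc = None   # accuracy-biased: intersection (AND) of observed patterns

    def learn(self, offsets):
        p = anchor_pattern(offsets)
        self.cov |= p
        self.acc = p if self.acc is None else (self.acc & p)

    def predict(self, trigger_offset, bw_utilization, threshold=0.75):
        """Return line offsets to prefetch for a new trigger access.

        Low bandwidth utilization selects the coverage-biased pattern;
        high utilization selects the accuracy-biased one.
        """
        pattern = self.cov if bw_utilization < threshold else (self.acc or 0)
        # Bit 0 is the trigger itself, so start from bit 1.
        return sorted(
            (trigger_offset + i) % LINES_PER_PAGE
            for i in range(1, LINES_PER_PAGE)
            if (pattern >> i) & 1
        )


prefetcher = DualPatternSketch()
prefetcher.learn([4, 5, 6, 8])     # anchored bits: 0, 1, 2, 4
prefetcher.learn([10, 11, 13])     # anchored bits: 0, 1, 3
print(prefetcher.predict(20, bw_utilization=0.2))  # [21, 22, 23, 24]
print(prefetcher.predict(20, bw_utilization=0.9))  # [21]
```

With bandwidth headroom, the sketch issues prefetches for every line seen in any learned pattern; near peak utilization it issues only the lines common to all of them, which mirrors the coverage versus accuracy trade-off the abstract describes.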